Matrix Multiplication on Three Heterogeneous Processors
Authors
Abstract
We present a new algorithm designed specifically for matrix multiplication on three heterogeneous processors. It extends the ‘square-corner’ algorithm designed for two-processor architectures [2]. For three processors on a fully connected network, the algorithm partitions the data so as to minimize the total volume of communication (TVC) between the processors, which yields lower execution times than existing partitionings for a defined range of processor power ratios. On a non-wraparound linear array in which the fastest processor is the middle node, the algorithm always results in a lower TVC. Minimizing the TVC is a natural goal because matrix multiplication involves substantial communication volumes and the links connecting the processors are potential bottlenecks. The goal of this paper is to determine whether the square-corner algorithm of [2] can be successfully extended to three processors.
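To make the TVC criterion concrete, the following is a minimal sketch for the two-processor case only, not the three-processor algorithm of this paper. It rests on assumptions that do not come from the abstract: the matrices A, B and C of size n x n are partitioned identically, each processor's area is proportional to its speed, every element that crosses the link is counted once, and the function names are hypothetical. Under this model a straight-line (strip) partition moves n² elements, while a square-corner partition with a square of side s for the slower processor moves 2·n·s elements.

```python
import math


def straight_line_tvc(n: int) -> float:
    """TVC for a strip partition (assumed model).

    Each processor needs the part of A it does not own, so the
    two transfers together cover all n*n elements of A once.
    """
    return float(n * n)


def square_corner_tvc(n: int, ratio: float) -> float:
    """TVC for a square-corner partition (assumed model).

    `ratio` is the speed of the faster processor divided by the speed
    of the slower one. The slower processor owns a square of area
    n*n / (1 + ratio), i.e. side s = n / sqrt(1 + ratio).
    It receives 2*s*(n - s) elements and sends 2*s*s, giving 2*n*s.
    """
    side = n / math.sqrt(1.0 + ratio)
    return 2.0 * n * side


def better_shape(n: int, ratio: float) -> str:
    """Name the shape with the lower TVC under the assumed model."""
    if square_corner_tvc(n, ratio) < straight_line_tvc(n):
        return "square-corner"
    return "straight-line"


if __name__ == "__main__":
    for ratio in (1.0, 3.0, 8.0):
        print(ratio, square_corner_tvc(100, ratio), better_shape(100, ratio))
```

In this simplified model the square-corner shape moves less data only when the speed ratio exceeds 3, which illustrates why a partitioning can be preferable for some range of processor power ratios and not for others.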
Similar resources
Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of ...
Data Allocation Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-dimensional Grids
We study the implementation of dense linear algebra computations, such as matrix multiplication and linear system solvers, on two-dimensional (2D) grids of heterogeneous processors. For these operations, 2D-grids are the key to scalability and efficiency. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these opera...
Two-Dimensional Matrix Partitioning for Parallel Computing on Heterogeneous Processors Based on Their Functional Performance Models
The functional performance model (FPM) of heterogeneous processors has proven to be more realistic than the traditional models because it integrates many important features of heterogeneous processors such as the processor heterogeneity, the heterogeneity of memory structure, and the effects of paging. Optimal 1D matrix partitioning algorithms employing FPMs of heterogeneous processors are alre...
A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse m...
Optimal Data Partitioning Shape for Matrix Multiplication on Three Fully Connected Heterogeneous Processors
Parallel Matrix-Matrix Multiplication (MMM) is used in scientific codes across many disciplines. While it has been widely studied how to optimally divide MMM among homogeneous compute nodes, the optimal solution for heterogeneous systems remains an open problem. Dividing MMM across multiple processors or clusters requires consideration of the performance characteristics of both the computation a...